Mutual Information Based Matching for Causal Inference with Observational Data

Authors

  • Lei Sun
  • Alexander G. Nikolaev
Abstract

This paper presents an information-theory-driven matching methodology for causal inference from observational data. The paper adopts the potential outcomes framework for evaluating the strength of cause-effect relationships: the population-wide average effect of a binary treatment is estimated by comparing two groups of units, the treated and the untreated (control). To reduce bias in such treatment effect estimates, the control group must be composed so that, across the compared groups, treatment is independent of the units' covariates. This requirement gives rise to a subset selection (matching) problem. The paper presents models and algorithms that solve the matching problem by minimizing the mutual information (MI) between the covariates and the treatment variable. This formulation becomes tractable thanks to derived optimality conditions that handle the non-linearity of the sample-based MI function. Computational experiments with mixed-integer programming formulations and four matching algorithms demonstrate the utility of MI-based matching for causal inference studies. The algorithmic developments culminate in a matching heuristic that balances the compared groups in polynomial (close to linear) time, making treatment effect estimation feasible on large data sets.
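To make the matching objective concrete, here is a minimal sketch that computes the plug-in (empirical) MI between a discrete covariate profile and a binary treatment indicator, then uses it in a naive greedy control-selection loop. This is an illustration under stated assumptions, not the authors' models or heuristic: covariates are assumed to be already discretized into hashable profiles, and `sample_mutual_information` and `greedy_control_selection` are hypothetical names. The greedy loop recomputes MI from scratch for every candidate, so it is much slower than the close-to-linear heuristic the abstract mentions.

```python
import math
from collections import Counter


def sample_mutual_information(covariates, treatment):
    """Plug-in MI (in nats) between discrete covariate profiles X and treatment T.

    Assumption: each element of `covariates` is a hashable, already-discretized profile.
    """
    n = len(treatment)
    joint = Counter(zip(covariates, treatment))  # counts of (x, t) pairs
    px = Counter(covariates)                      # marginal counts of x
    pt = Counter(treatment)                       # marginal counts of t
    mi = 0.0
    for (x, t), c in joint.items():
        # p(x,t) * log(p(x,t) / (p(x) p(t))) with empirical frequencies c/n, px/n, pt/n
        mi += (c / n) * math.log(c * n / (px[x] * pt[t]))
    return mi


def greedy_control_selection(treated_x, control_x, k):
    """Hypothetical greedy heuristic (not the paper's algorithm).

    Adds k controls one at a time, each time choosing the remaining control whose
    inclusion gives the lowest sample MI between covariates and treatment.
    Ties are broken by the lowest control index.
    """
    chosen = []
    remaining = list(range(len(control_x)))
    for _ in range(k):
        best_i, best_mi = None, float("inf")
        for i in remaining:
            xs = treated_x + [control_x[j] for j in chosen] + [control_x[i]]
            ts = [1] * len(treated_x) + [0] * (len(chosen) + 1)
            mi = sample_mutual_information(xs, ts)
            if mi < best_mi:
                best_i, best_mi = i, mi
        chosen.append(best_i)
        remaining.remove(best_i)
    return chosen


if __name__ == "__main__":
    # Toy data: covariate profiles for 3 treated units and 6 candidate controls.
    treated_x = ["a", "a", "b"]
    control_x = ["a", "b", "b", "b", "c", "c"]

    picked = greedy_control_selection(treated_x, control_x, k=3)
    print(picked)  # [0, 1, 2]

    # Compare MI using all controls against MI using only the selected controls.
    all_mi = sample_mutual_information(treated_x + control_x,
                                       [1] * 3 + [0] * len(control_x))
    matched_mi = sample_mutual_information(treated_x + [control_x[i] for i in picked],
                                           [1] * 3 + [0] * len(picked))
    print(round(all_mi, 3), round(matched_mi, 3))
```

The sample MI is zero exactly when the empirical covariate distribution is the same in the treated group and the selected control group, which is why driving it down corresponds to balancing the compared groups.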


Similar sources

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

In a randomized study, subjects are randomly assigned to either a treated group or a control group. Random assignment ensures that the distribution of the covariates is the same in both groups and that the treatment effect can be estimated by directly comparing the outcomes for the subjects in the two groups. In contrast, subjects in an observational study are not randomly assigned. In order to...

Matching and Propensity Scores

The popularity of matching techniques has increased considerably during the last decades. They are mainly used for matching treatment and control units in order to estimate causal treatment effects from observational studies or for integrating two or more data sets that share a common subset of covariates. In focusing on causal inference with observational studies, we discuss multivariate match...

Multiple Imputation for Causal Inference

The potential outcome framework for causal inference is fundamentally a missing data problem with a special, the so-called file-matching, pattern of missing data. Given the large body of literature on various methods for handling missing data and associated software, it will be useful to use such methods to facilitate causal inference for routine applications. This article uses the sequential r...

Matching Methods for High-Dimensional Data with Applications to Text

Matching is a popular technique for preprocessing observational data to facilitate causal inference and reduce model dependence by ensuring that treated and control units are balanced along pre-treatment covariates. While most applications of matching balance on a small number of covariates, we identify situations where matching with thousands of covariates may be desirable, such as causal infe...

Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Despite their appeal, randomized experiments cannot always be conducted, for example, due to ethical or practical reasons. In order to remove selection bias and draw causal inferences from observational data, propensity score matching techniques have gained increased popularity during the past three decades. Although propensity score methods have been studied extensively for single-level data, ...

Journal title:
  • Journal of Machine Learning Research

Volume: 17

Publication date: 2016